Overview

Dataset statistics

Number of variables: 25
Number of observations: 4185749
Missing cells: 39486179
Missing cells (%): 37.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.5 GiB
Average record size in memory: 1.1 KiB
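
The derived figures in this table can be cross-checked from the raw counts. The sketch below is a minimal check, assuming the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total memory divided by the number of observations. The variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 25
n_observations = 4_185_749
missing_cells = 39_486_179

# ASSUMPTION: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# "4.5 GiB" is itself a rounded figure, so this is only approximate.
total_bytes = 4.5 * 1024**3
avg_record_kib = total_bytes / n_observations / 1024

print(f"total cells:         {total_cells}")
print(f"missing cells:       {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Both results round to the values shown in the table (37.7% and 1.1 KiB).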

Variable types

Numeric: 4
DateTime: 2
Text: 9
Categorical: 10

Alerts

DRIVER_LICENSE_STATUS is highly imbalanced (86.5%) [Imbalance]
VEHICLE_DAMAGE_3 is highly imbalanced (53.9%) [Imbalance]
PUBLIC_PROPERTY_DAMAGE is highly imbalanced (63.5%) [Imbalance]
STATE_REGISTRATION has 305429 (7.3%) missing values [Missing]
VEHICLE_TYPE has 237271 (5.7%) missing values [Missing]
VEHICLE_MAKE has 1881778 (45.0%) missing values [Missing]
VEHICLE_MODEL has 4134369 (98.8%) missing values [Missing]
VEHICLE_YEAR has 1901490 (45.4%) missing values [Missing]
TRAVEL_DIRECTION has 1668118 (39.9%) missing values [Missing]
VEHICLE_OCCUPANTS has 1782928 (42.6%) missing values [Missing]
DRIVER_SEX has 2221537 (53.1%) missing values [Missing]
DRIVER_LICENSE_STATUS has 2310803 (55.2%) missing values [Missing]
DRIVER_LICENSE_JURISDICTION has 2306176 (55.1%) missing values [Missing]
PRE_CRASH has 921425 (22.0%) missing values [Missing]
POINT_OF_IMPACT has 1701246 (40.6%) missing values [Missing]
VEHICLE_DAMAGE has 1725730 (41.2%) missing values [Missing]
VEHICLE_DAMAGE_1 has 2601039 (62.1%) missing values [Missing]
VEHICLE_DAMAGE_2 has 2991845 (71.5%) missing values [Missing]
VEHICLE_DAMAGE_3 has 3270248 (78.1%) missing values [Missing]
PUBLIC_PROPERTY_DAMAGE has 1528858 (36.5%) missing values [Missing]
PUBLIC_PROPERTY_DAMAGE_TYPE has 4159532 (99.4%) missing values [Missing]
CONTRIBUTING_FACTOR_1 has 148303 (3.5%) missing values [Missing]
CONTRIBUTING_FACTOR_2 has 1688054 (40.3%) missing values [Missing]
VEHICLE_YEAR is highly skewed (γ1 = 55.36312215) [Skewed]
VEHICLE_OCCUPANTS is highly skewed (γ1 = 1088.274577) [Skewed]
UNIQUE_ID has unique values [Unique]
VEHICLE_OCCUPANTS has 412632 (9.9%) zeros [Zeros]
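
Each missing-value percentage in the alerts is the missing count divided by the 4185749 observations. The following sketch recomputes a few of them and flags columns that are almost entirely empty. The 95% cut-off and the names `missing_pct` and `SPARSE_THRESHOLD` are assumptions made for illustration, not settings taken from the report.

```python
n_observations = 4_185_749  # from Dataset statistics

# Missing-value counts copied from four of the alerts.
missing_counts = {
    "STATE_REGISTRATION": 305_429,
    "VEHICLE_MAKE": 1_881_778,
    "VEHICLE_MODEL": 4_134_369,
    "PUBLIC_PROPERTY_DAMAGE_TYPE": 4_159_532,
}


def missing_pct(count: int, total: int = n_observations) -> float:
    """Return the share of missing values as a percentage, one decimal."""
    return round(100 * count / total, 1)


missing_table = {name: missing_pct(c) for name, c in missing_counts.items()}

# ASSUMPTION: a hypothetical threshold for "too sparse to analyse".
SPARSE_THRESHOLD = 95.0
sparse = [name for name, pct in missing_table.items() if pct >= SPARSE_THRESHOLD]

for name, pct in missing_table.items():
    print(f"{name}: {pct}%")
print("nearly empty:", sparse)
```

Under that assumed threshold, only VEHICLE_MODEL and PUBLIC_PROPERTY_DAMAGE_TYPE are flagged.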

Reproduction

Analysis started: 2024-05-07 03:20:16.540189
Analysis finished: 2024-05-07 03:22:46.485263
Duration: 2 minutes and 29.95 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json
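
The reported duration follows directly from the two timestamps. A minimal check using only the Python standard library (the format string and variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-07 03:20:16.540189", FMT)
finished = datetime.strptime("2024-05-07 03:22:46.485263", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
duration_text = f"{int(minutes)} minutes and {seconds:.2f} seconds"
print(duration_text)
```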

Variables

UNIQUE_ID
Real number (ℝ)

UNIQUE 

Distinct: 4185749
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16558039
Minimum: 111711
Maximum: 20645072
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 MiB

Quantile statistics

Minimum: 111711
5-th percentile: 9673053.4
Q1: 14562233
median: 17550710
Q3: 19117378
95-th percentile: 20417546
Maximum: 20645072
Range: 20533361
Interquartile range (IQR): 4555145

Descriptive statistics

Standard deviation: 3350117
Coefficient of variation (CV): 0.2023257
Kurtosis: -0.38117072
Mean: 16558039
Median Absolute Deviation (MAD): 2475734
Skewness: -0.80951018
Sum: 6.9307797 × 10^13
Variance: 1.1223284 × 10^13
Monotonicity: Not monotonic
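
Several of the UNIQUE_ID statistics are simple functions of the others. The sketch below recomputes them from the figures listed in the quantile and descriptive statistics, assuming the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). Because the inputs are already rounded in the report, the results agree only to the displayed precision.

```python
import math

n = 4_185_749                      # observations, none missing
minimum, maximum = 111_711, 20_645_072
q1, q3 = 14_562_233, 19_117_378
mean, std = 16_558_039, 3_350_117  # both rounded in the report

value_range = maximum - minimum    # reported: 20533361
iqr = q3 - q1                      # reported: 4555145
cv = std / mean                    # reported: 0.2023257
variance = std ** 2                # reported: 1.1223284 x 10^13
total = mean * n                   # reported: 6.9307797 x 10^13

print(value_range, iqr)
print(f"{cv:.7f} {variance:.7e} {total:.7e}")
print(math.isclose(total, 6.9307797e13, rel_tol=1e-6))
```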
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
10385780 | 1 | < 0.1%
19089622 | 1 | < 0.1%
19016024 | 1 | < 0.1%
17681000 | 1 | < 0.1%
17620013 | 1 | < 0.1%
17133151 | 1 | < 0.1%
17632013 | 1 | < 0.1%
17573075 | 1 | < 0.1%
18705606 | 1 | < 0.1%
18952545 | 1 | < 0.1%
Other values (4185739) | 4185739 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
111711 | 1 | < 0.1%
111712 | 1 | < 0.1%
115530 | 1 | < 0.1%
115531 | 1 | < 0.1%
120620 | 1 | < 0.1%
123422 | 1 | < 0.1%
123423 | 1 | < 0.1%
199289 | 1 | < 0.1%
199290 | 1 | < 0.1%
199291 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20645072 | 1 | < 0.1%
20645071 | 1 | < 0.1%
20645049 | 1 | < 0.1%
20645048 | 1 | < 0.1%
20645047 | 1 | < 0.1%
20645040 | 1 | < 0.1%
20645039 | 1 | < 0.1%
20645038 | 1 | < 0.1%
20645037 | 1 | < 0.1%
20645036 | 1 | < 0.1%

COLLISION_ID
Real number (ℝ)

Distinct: 2083567
Distinct (%): 49.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3181329.3
Minimum: 22
Maximum: 4722272
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 MiB

Quantile statistics

Minimum: 22
5-th percentile: 108378.4
Q1: 3163427
median: 3687323
Q3: 4207061
95-th percentile: 4618128.6
Maximum: 4722272
Range: 4722250
Interquartile range (IQR): 1043634

Descriptive statistics

Standard deviation: 1497302
Coefficient of variation (CV): 0.47065295
Kurtosis: 0.041194458
Mean: 3181329.3
Median Absolute Deviation (MAD): 521814
Skewness: -1.2467723
Sum: 1.3316246 × 10^13
Variance: 2.2419134 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
4691158 | 42 | < 0.1%
4539133 | 40 | < 0.1%
4275782 | 25 | < 0.1%
3925685 | 22 | < 0.1%
4541337 | 22 | < 0.1%
4324675 | 22 | < 0.1%
306480 | 21 | < 0.1%
4625450 | 20 | < 0.1%
4578189 | 19 | < 0.1%
3187017 | 19 | < 0.1%
Other values (2083557) | 4185497 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
22 | 2 | < 0.1%
23 | 2 | < 0.1%
24 | 2 | < 0.1%
25 | 2 | < 0.1%
26 | 2 | < 0.1%
27 | 2 | < 0.1%
28 | 2 | < 0.1%
29 | 2 | < 0.1%
30 | 2 | < 0.1%
31 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4722272 | 2 | < 0.1%
4722270 | 2 | < 0.1%
4722268 | 2 | < 0.1%
4722265 | 1 | < 0.1%
4722264 | 2 | < 0.1%
4722263 | 2 | < 0.1%
4722260 | 2 | < 0.1%
4722259 | 1 | < 0.1%
4722254 | 2 | < 0.1%
4722253 | 2 | < 0.1%

Distinct: 4325
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2024-05-03 00:00:00
Histogram with fixed size bins (bins=50)

Distinct: 1440
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 MiB
Minimum: 2024-05-06 00:00:00
Maximum: 2024-05-06 23:59:00
Histogram with fixed size bins (bins=50)

Distinct: 2656924
Distinct (%): 63.5%
Missing: 0
Missing (%): 0.0%
Memory size: 309.5 MiB

Length

Max length: 36
Median length: 36
Mean length: 20.543244
Min length: 1

Characters and Unicode

Total characters: 85988861
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2656905
Unique (%): 63.5%

Sample

1st row: 1
2nd row: 0553ab4d-9500-4cba-8d98-f4d7f89d5856
3rd row: 2
4th row: 1
5th row: 1
Value | Count | Frequency (%)
1 | 769061 | 18.4%
2 | 694883 | 16.6%
3 | 50530 | 1.2%
4 | 10398 | 0.2%
5 | 2608 | 0.1%
6 | 791 | < 0.1%
7 | 281 | < 0.1%
8 | 130 | < 0.1%
9 | 69 | < 0.1%
10 | 36 | < 0.1%
Other values (2656914) | 2656962 | 63.5%

Most occurring characters

Value | Count | Frequency (%)
- | 9127904 | 10.6%
4 | 6791276 | 7.9%
1 | 5374630 | 6.3%
2 | 5213477 | 6.1%
8 | 5067349 | 5.9%
9 | 5064080 | 5.9%
b | 4853328 | 5.6%
a | 4845628 | 5.6%
3 | 4552258 | 5.3%
5 | 4501708 | 5.2%
Other values (7) | 30597223 | 35.6%

Most occurring categories

(unknown): 85988861 (100.0%)

Most occurring scripts

(unknown): 85988861 (100.0%)

Most occurring blocks

(unknown): 85988861 (100.0%)

STATE_REGISTRATION
Text

MISSING 

Distinct: 82
Distinct (%): < 0.1%
Missing: 305429
Missing (%): 7.3%
Memory size: 227.7 MiB

Length

Max length: 2
Median length: 2
Mean length: 1.9999997
Min length: 1

Characters and Unicode

Total characters: 7760639
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: NY
2nd row: NY
3rd row: NY
4th row: NY
5th row: NY
Value | Count | Frequency (%)
ny | 3232331 | 83.3%
nj | 236709 | 6.1%
pa | 86926 | 2.2%
fl | 47820 | 1.2%
ct | 43817 | 1.1%
va | 19244 | 0.5%
ma | 18362 | 0.5%
md | 18150 | 0.5%
nc | 17039 | 0.4%
ga | 14182 | 0.4%
Other values (72) | 145740 | 3.8%

Most occurring characters

Value | Count | Frequency (%)
N | 3513550 | 45.3%
Y | 3233546 | 41.7%
J | 236709 | 3.1%
A | 160691 | 2.1%
P | 87944 | 1.1%
C | 76227 | 1.0%
T | 65899 | 0.8%
L | 62048 | 0.8%
M | 49564 | 0.6%
F | 48230 | 0.6%
Other values (16) | 226231 | 2.9%

Most occurring categories

(unknown): 7760639 (100.0%)

Most occurring scripts

(unknown): 7760639 (100.0%)

Most occurring blocks

(unknown): 7760639 (100.0%)

VEHICLE_TYPE
Text

MISSING 

Distinct: 2725
Distinct (%): 0.1%
Missing: 237271
Missing (%): 5.7%
Memory size: 284.3 MiB

Length

Max length: 38
Median length: 30
Mean length: 16.576971
Min length: 1

Characters and Unicode

Total characters: 65453806
Distinct characters: 75
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1652
Unique (%): < 0.1%

Sample

1st row: PASSENGER VEHICLE
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: TAXI
4th row: PASSENGER VEHICLE
5th row: PASSENGER VEHICLE
Value | Count | Frequency (%)
vehicle | 1626128 | 17.6%
utility | 1173683 | 12.7%
station | 1173610 | 12.7%
sedan | 1126807 | 12.2%
wagon/sport | 835682 | 9.1%
passenger | 770775 | 8.4%
(blank) | 340727 | 3.7%
wagon | 338047 | 3.7%
sport | 337927 | 3.7%
truck | 177854 | 1.9%
Other values (1445) | 1311969 | 14.2%

Most occurring characters

Value | Count | Frequency (%)
(blank) | 5264731 | 8.0%
S | 5055296 | 7.7%
t | 4247518 | 6.5%
i | 3601749 | 5.5%
E | 3408377 | 5.2%
e | 2990202 | 4.6%
a | 2975013 | 4.5%
n | 2840400 | 4.3%
o | 2668562 | 4.1%
T | 2162950 | 3.3%
Other values (65) | 30239008 | 46.2%

Most occurring categories

(unknown): 65453806 (100.0%)

Most occurring scripts

(unknown): 65453806 (100.0%)

Most occurring blocks

(unknown): 65453806 (100.0%)

VEHICLE_MAKE
Text

MISSING 

Distinct: 12874
Distinct (%): 0.6%
Missing: 1881778
Missing (%): 45.0%
Memory size: 210.6 MiB

Length

Max length: 52
Median length: 13
Mean length: 12.693347
Min length: 1

Characters and Unicode

Total characters: 29245103
Distinct characters: 80
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 9000
Unique (%): 0.4%

Sample

1st row: TOYT -CAR/SUV
2nd row: MERZ -CAR/SUV
3rd row: FRHT-TRUCK/BUS
4th row: FORD -CAR/SUV
5th row: VOLK -CAR/SUV
Value | Count | Frequency (%)
car/suv | 2073789 | 46.9%
toyt | 394420 | 8.9%
hond | 286537 | 6.5%
niss | 233661 | 5.3%
ford | 200368 | 4.5%
chev | 110723 | 2.5%
hyun | 82015 | 1.9%
bmw | 78101 | 1.8%
merz | 76151 | 1.7%
jeep | 75264 | 1.7%
Other values (6814) | 812645 | 18.4%

Most occurring characters

Value | Count | Frequency (%)
S | 2819603 | 9.6%
R | 2667135 | 9.1%
U | 2560537 | 8.8%
C | 2525143 | 8.6%
A | 2311261 | 7.9%
V | 2260432 | 7.7%
- | 2214931 | 7.6%
/ | 2204446 | 7.5%
(blank) | 2119703 | 7.2%
O | 1060624 | 3.6%
Other values (70) | 6501288 | 22.2%

Most occurring categories

(unknown): 29245103 (100.0%)

Most occurring scripts

(unknown): 29245103 (100.0%)

Most occurring blocks

(unknown): 29245103 (100.0%)

VEHICLE_MODEL
Text

MISSING 

Distinct: 2429
Distinct (%): 4.7%
Missing: 4134369
Missing (%): 98.8%
Memory size: 129.3 MiB

Length

Max length: 25
Median length: 8
Mean length: 7.5591086
Min length: 1

Characters and Unicode

Total characters: 388387
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1327
Unique (%): 2.6%

Sample

1st row: TOYT 4RN
2nd row: FORD ZZZ
3rd row: TRUCK TRADE
4th row: DODG CHA
5th row: town and country
Value | Count | Frequency (%)
zzz | 9213 | 9.7%
toyt | 8644 | 9.1%
hond | 5999 | 6.3%
niss | 5220 | 5.5%
ford | 4930 | 5.2%
cam | 3092 | 3.3%
chev | 2681 | 2.8%
acc | 1899 | 2.0%
hyun | 1575 | 1.7%
alt | 1532 | 1.6%
Other values (1769) | 50052 | 52.8%

Most occurring characters

Value | Count | Frequency (%)
(blank) | 43457 | 11.2%
Z | 32695 | 8.4%
T | 27048 | 7.0%
O | 25820 | 6.6%
C | 22245 | 5.7%
N | 21375 | 5.5%
S | 18775 | 4.8%
A | 17553 | 4.5%
D | 17438 | 4.5%
R | 16184 | 4.2%
Other values (63) | 145797 | 37.5%

Most occurring categories

(unknown): 388387 (100.0%)

Most occurring scripts

(unknown): 388387 (100.0%)

Most occurring blocks

(unknown): 388387 (100.0%)

VEHICLE_YEAR
Real number (ℝ)

MISSING  SKEWED 

Distinct: 321
Distinct (%): < 0.1%
Missing: 1901490
Missing (%): 45.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.1493
Minimum: 1000
Maximum: 20063
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 MiB

Quantile statistics

Minimum: 1000
5-th percentile: 2001
Q1: 2008
median: 2014
Q3: 2017
95-th percentile: 2020
Maximum: 20063
Range: 19063
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 148.32851
Coefficient of variation (CV): 0.073606707
Kurtosis: 3275.7836
Mean: 2015.1493
Median Absolute Deviation (MAD): 3
Skewness: 55.363122
Sum: 4.603123 × 10^9
Variance: 22001.346
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
2016 | 221129 | 5.3%
2015 | 218791 | 5.2%
2017 | 200176 | 4.8%
2014 | 161252 | 3.9%
2013 | 137632 | 3.3%
2018 | 137172 | 3.3%
2012 | 109886 | 2.6%
2011 | 98656 | 2.4%
2019 | 96930 | 2.3%
2007 | 90282 | 2.2%
Other values (311) | 812353 | 19.4%
(Missing) | 1901490 | 45.4%

Minimum 10 values

Value | Count | Frequency (%)
1000 | 1 | < 0.1%
1111 | 2 | < 0.1%
1900 | 7 | < 0.1%
1920 | 2 | < 0.1%
1921 | 1 | < 0.1%
1923 | 1 | < 0.1%
1926 | 1 | < 0.1%
1930 | 1 | < 0.1%
1931 | 1 | < 0.1%
1932 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20063 | 1 | < 0.1%
20015 | 2 | < 0.1%
20009 | 1 | < 0.1%
20003 | 1 | < 0.1%
19969 | 1 | < 0.1%
9999 | 728 | < 0.1%
9972 | 1 | < 0.1%
9699 | 1 | < 0.1%
9019 | 1 | < 0.1%
8888 | 1 | < 0.1%
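
The extreme values listed for VEHICLE_YEAR (for example 1000, 9999 and 20063) cannot be real model years, and they are the likely source of the very high skewness. The sketch below shows one way to flag such entries before analysis. The bounds 1900 and 2025 and the helper `is_plausible_year` are assumptions chosen for illustration; the report itself does not define a valid range.

```python
# Year -> row count, copied from the extreme values listed for VEHICLE_YEAR.
lowest_years = {1000: 1, 1111: 2, 1900: 7, 1920: 2, 1921: 1,
                1923: 1, 1926: 1, 1930: 1, 1931: 1, 1932: 1}
highest_years = {20063: 1, 20015: 2, 20009: 1, 20003: 1, 19969: 1,
                 9999: 728, 9972: 1, 9699: 1, 9019: 1, 8888: 1}

# ASSUMPTION: illustrative bounds, not taken from the report.
MIN_PLAUSIBLE_YEAR = 1900
MAX_PLAUSIBLE_YEAR = 2025


def is_plausible_year(year: int) -> bool:
    """Return True if the year lies inside the assumed valid range."""
    return MIN_PLAUSIBLE_YEAR <= year <= MAX_PLAUSIBLE_YEAR


extremes = {**lowest_years, **highest_years}
flagged = {year: count for year, count in extremes.items()
           if not is_plausible_year(year)}
flagged_rows = sum(flagged.values())

print("flagged years:", sorted(flagged))
print("flagged rows among the listed extremes:", flagged_rows)
```

Most flagged rows carry the value 9999, which looks like a placeholder rather than a typing error, so it may be better treated as missing than corrected.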

TRAVEL_DIRECTION
Categorical

MISSING 

Distinct: 15
Distinct (%): < 0.1%
Missing: 1668118
Missing (%): 39.9%
Memory size: 250.2 MiB
West: 578302
East: 576123
North: 575683
South: 569823
Unknown: 83866
Other values (10): 133834

Length

Max length: 9
Median length: 7
Mean length: 4.8151703
Min length: 1

Characters and Unicode

Total characters: 12122822
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: North
2nd row: East
3rd row: East
4th row: Southwest
5th row: South

Common Values

Value | Count | Frequency (%)
West | 578302 | 13.8%
East | 576123 | 13.8%
North | 575683 | 13.8%
South | 569823 | 13.6%
Unknown | 83866 | 2.0%
Northeast | 35085 | 0.8%
Southeast | 33374 | 0.8%
Southwest | 32658 | 0.8%
Northwest | 30970 | 0.7%
- | 1003 | < 0.1%
Other values (5) | 744 | < 0.1%
(Missing) | 1668118 | 39.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
west | 578302 | 23.0%
east | 576123 | 22.9%
north | 575683 | 22.9%
south | 569823 | 22.6%
unknown | 83866 | 3.3%
northeast | 35085 | 1.4%
southeast | 33374 | 1.3%
southwest | 32658 | 1.3%
northwest | 30970 | 1.2%
(blank) | 1003 | < 0.1%
Other values (5) | 744 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
t | 2564105 | 21.2%
o | 1361459 | 11.2%
s | 1286512 | 10.6%
h | 1277593 | 10.5%
e | 710389 | 5.9%
a | 644582 | 5.3%
N | 641915 | 5.3%
r | 641738 | 5.3%
S | 636061 | 5.2%
u | 635855 | 5.2%
Other values (7) | 1722613 | 14.2%

Most occurring categories

(unknown): 12122822 (100.0%)

Most occurring scripts

(unknown): 12122822 (100.0%)

Most occurring blocks

(unknown): 12122822 (100.0%)

VEHICLE_OCCUPANTS
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct: 133
Distinct (%): < 0.1%
Missing: 1782928
Missing (%): 42.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 879.64076
Minimum: 0
Maximum: 1 × 10^9
Zeros: 412632
Zeros (%): 9.9%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 1
Q3: 1
95-th percentile: 3
Maximum: 1 × 10^9
Range: 1 × 10^9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 906508.38
Coefficient of variation (CV): 1030.5439
Kurtosis: 1189444.3
Mean: 879.64076
Median Absolute Deviation (MAD): 0
Skewness: 1088.2746
Sum: 2.1136193 × 10^9
Variance: 8.2175744 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1 | 1515162 | 36.2%
0 | 412632 | 9.9%
2 | 318900 | 7.6%
3 | 91605 | 2.2%
4 | 38149 | 0.9%
5 | 13947 | 0.3%
6 | 4584 | 0.1%
7 | 2028 | < 0.1%
8 | 1203 | < 0.1%
9 | 843 | < 0.1%
Other values (123) | 3768 | 0.1%
(Missing) | 1782928 | 42.6%

Minimum 10 values

Value | Count | Frequency (%)
0 | 412632 | 9.9%
1 | 1515162 | 36.2%
2 | 318900 | 7.6%
3 | 91605 | 2.2%
4 | 38149 | 0.9%
5 | 13947 | 0.3%
6 | 4584 | 0.1%
7 | 2028 | < 0.1%
8 | 1203 | < 0.1%
9 | 843 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
999999999 | 1 | < 0.1%
981990849 | 1 | < 0.1%
99999999 | 1 | < 0.1%
9999999 | 2 | < 0.1%
5292023 | 1 | < 0.1%
999999 | 3 | < 0.1%
99999 | 3 | < 0.1%
24260 | 1 | < 0.1%
9999 | 16 | < 0.1%
2017 | 2 | < 0.1%
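
The mean of VEHICLE_OCCUPANTS (879.64) is far from its median (1) because a handful of extreme entries dominate the sum. The following sketch, using only figures from this section, shows how much of the reported sum comes from the two largest values. The variable names are illustrative, and the reported sum is rounded, so the results are approximate.

```python
n_observations = 4_185_749
n_missing = 1_782_928
reported_sum = 2.1136193e9   # rounded in the report

n_present = n_observations - n_missing
mean = reported_sum / n_present

# The two largest values listed for this variable, each occurring once.
top_two = 999_999_999 + 981_990_849
share_of_sum = top_two / reported_sum

# Mean after dropping only those two rows.
mean_without_top_two = (reported_sum - top_two) / (n_present - 2)

print(f"non-missing rows:      {n_present}")
print(f"mean:                  {mean:.2f}")
print(f"share of sum (top 2):  {share_of_sum:.1%}")
print(f"mean without top 2:    {mean_without_top_two:.2f}")
```

Two rows out of about 2.4 million account for roughly 94% of the sum. The mean stays far above the median even after removing them, because other large entries such as 99999999 and 9999999 remain.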

DRIVER_SEX
Categorical

MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2221537
Missing (%): 53.1%
Memory size: 244.2 MiB
M: 1453033
F: 502957
U: 8222

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1964212
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: M
4th row: F
5th row: M

Common Values

Value | Count | Frequency (%)
M | 1453033 | 34.7%
F | 502957 | 12.0%
U | 8222 | 0.2%
(Missing) | 2221537 | 53.1%
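
The same counts appear with two different percentages in this section: 34.7% for M in the Common Values table and 74.0% further down. The difference is the denominator. A minimal sketch, assuming the first is computed over all rows and the second over non-missing rows only (the dictionary names are illustrative):

```python
n_observations = 4_185_749
n_missing = 2_221_537
counts = {"M": 1_453_033, "F": 502_957, "U": 8_222}

n_present = sum(counts.values())

# Share of all rows, missing included (Common Values table).
pct_of_all = {k: round(100 * v / n_observations, 1) for k, v in counts.items()}

# Share of non-missing rows only (the later frequency tables).
pct_of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}

print(n_present, n_present + n_missing == n_observations)
print(pct_of_all)
print(pct_of_present)
```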

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
m | 1453033 | 74.0%
f | 502957 | 25.6%
u | 8222 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
M | 1453033 | 74.0%
F | 502957 | 25.6%
U | 8222 | 0.4%

Most occurring categories

(unknown): 1964212 (100.0%)

Most occurring scripts

(unknown): 1964212 (100.0%)

Most occurring blocks

(unknown): 1964212 (100.0%)

DRIVER_LICENSE_STATUS
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2310803
Missing (%): 55.2%
Memory size: 257.3 MiB
Licensed: 1820880
Unlicensed: 37378
Permit: 16688

Length

Max length: 10
Median length: 8
Mean length: 8.02207
Min length: 6

Characters and Unicode

Total characters: 15040948
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Licensed
2nd row: Licensed
3rd row: Licensed
4th row: Licensed
5th row: Licensed

Common Values

Value | Count | Frequency (%)
Licensed | 1820880 | 43.5%
Unlicensed | 37378 | 0.9%
Permit | 16688 | 0.4%
(Missing) | 2310803 | 55.2%
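
The report does not state how the imbalance percentage in the alerts is computed. One plausible reading, assumed here, is one minus the Shannon entropy of the class counts divided by log2 of the number of classes. With the three non-missing counts for DRIVER_LICENSE_STATUS this gives about 86.5%, which matches the alert. The function name `imbalance_score` is hypothetical.

```python
import math

# Non-missing class counts for DRIVER_LICENSE_STATUS.
counts = {"Licensed": 1_820_880, "Unlicensed": 37_378, "Permit": 16_688}


def imbalance_score(value_counts):
    """ASSUMED definition: 1 - H(p) / log2(k), with H in bits.

    Returns 0.0 for perfectly balanced classes and approaches 1.0
    as a single class dominates.
    """
    n_classes = len(value_counts)
    if n_classes <= 1:
        return 0.0
    total = sum(value_counts)
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in value_counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

A perfectly balanced three-class variable would score 0 under this definition, so 86.5% reflects how strongly "Licensed" dominates the non-missing rows.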

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
licensed | 1820880 | 97.1%
unlicensed | 37378 | 2.0%
permit | 16688 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
e | 3733204 | 24.8%
n | 1895636 | 12.6%
i | 1874946 | 12.5%
c | 1858258 | 12.4%
s | 1858258 | 12.4%
d | 1858258 | 12.4%
L | 1820880 | 12.1%
U | 37378 | 0.2%
l | 37378 | 0.2%
P | 16688 | 0.1%
Other values (3) | 50064 | 0.3%

Most occurring categories

(unknown): 15040948 (100.0%)

Most occurring scripts

(unknown): 15040948 (100.0%)

Most occurring blocks

(unknown): 15040948 (100.0%)

DRIVER_LICENSE_JURISDICTION
Text

MISSING 

Distinct: 72
Distinct (%): < 0.1%
Missing: 2306176
Missing (%): 55.1%
Memory size: 176.1 MiB

Length

Max length: 8
Median length: 2
Mean length: 2.0027288
Min length: 2

Characters and Unicode

Total characters: 3764275
Distinct characters: 30
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: NY
2nd row: FL
3rd row: NY
4th row: NY
5th row: NY
Value | Count | Frequency (%)
ny | 1619478 | 86.2%
nj | 106707 | 5.7%
pa | 33393 | 1.8%
ct | 20922 | 1.1%
fl | 19940 | 1.1%
md | 10792 | 0.6%
nc | 6626 | 0.4%
ma | 6308 | 0.3%
ga | 6221 | 0.3%
va | 6180 | 0.3%
Other values (61) | 43006 | 2.3%

Most occurring characters

Value | Count | Frequency (%)
N | 1739716 | 46.2%
Y | 1619873 | 43.0%
J | 106708 | 2.8%
A | 62189 | 1.7%
C | 35669 | 0.9%
P | 34039 | 0.9%
T | 25654 | 0.7%
L | 23016 | 0.6%
M | 21899 | 0.6%
F | 20260 | 0.5%
Other values (20) | 75252 | 2.0%

Most occurring categories

(unknown): 3764275 (100.0%)

Most occurring scripts

(unknown): 3764275 (100.0%)

Most occurring blocks

(unknown): 3764275 (100.0%)

PRE_CRASH
Categorical

MISSING 

Distinct: 19
Distinct (%): < 0.1%
Missing: 921425
Missing (%): 22.0%
Memory size: 283.4 MiB
Going Straight Ahead: 1598318
Parked: 561177
Making Left Turn: 201185
Making Right Turn: 165325
Stopped in Traffic: 150370
Other values (14): 587949

Length

Max length: 26
Median length: 24
Mean length: 15.956158
Min length: 6

Characters and Unicode

Total characters: 52086070
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Going Straight Ahead
2nd row: Going Straight Ahead
3rd row: Parked
4th row: Merging
5th row: Parked

Common Values

Value | Count | Frequency (%)
Going Straight Ahead | 1598318 | 38.2%
Parked | 561177 | 13.4%
Making Left Turn | 201185 | 4.8%
Making Right Turn | 165325 | 3.9%
Stopped in Traffic | 150370 | 3.6%
Slowing or Stopping | 115597 | 2.8%
Backing | 111297 | 2.7%
Changing Lanes | 95659 | 2.3%
Starting from Parking | 53394 | 1.3%
Merging | 53182 | 1.3%
Other values (9) | 158820 | 3.8%
(Missing) | 921425 | 22.0%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
going | 1598318 | 19.7%
straight | 1598318 | 19.7%
ahead | 1598318 | 19.7%
parked | 602057 | 7.4%
making | 397088 | 4.9%
turn | 397088 | 4.9%
left | 202187 | 2.5%
in | 166924 | 2.1%
right | 166218 | 2.0%
traffic | 163363 | 2.0%
Other values (23) | 1223246 | 15.1%

Most occurring characters

Value | Count | Frequency (%)
i | 4868299 | 9.3%
(blank) | 4848801 | 9.3%
a | 4823079 | 9.3%
g | 4598754 | 8.8%
t | 4085494 | 7.8%
n | 3524362 | 6.8%
h | 3493515 | 6.7%
r | 3180051 | 6.1%
e | 2784500 | 5.3%
d | 2359762 | 4.5%
Other values (28) | 13519453 | 26.0%

Most occurring categories

(unknown): 52086070 (100.0%)

Most occurring scripts

(unknown): 52086070 (100.0%)

Most occurring blocks

(unknown): 52086070 (100.0%)

POINT_OF_IMPACT
Categorical

MISSING 

Distinct: 19
Distinct (%): < 0.1%
Missing: 1701246
Missing (%): 40.6%
Memory size: 281.0 MiB
Center Front End: 429564
Left Front Bumper: 314480
Center Back End: 299492
Right Front Bumper: 277632
Right Front Quarter Panel: 177706
Other values (14): 985629

Length

Max length: 25
Median length: 23
Mean length: 17.77199
Min length: 4

Characters and Unicode

Total characters: 44154563
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Left Front Bumper
2nd row: Right Front Bumper
3rd row: Left Front Quarter Panel
4th row: Center Front End
5th row: Right Rear Bumper

Common Values

Value                      Count    Frequency (%)
Center Front End           429564   10.3%
Left Front Bumper          314480   7.5%
Center Back End            299492   7.2%
Right Front Bumper         277632   6.6%
Right Front Quarter Panel  177706   4.2%
Left Front Quarter Panel   175025   4.2%
Left Rear Quarter Panel    142767   3.4%
Left Side Doors            131452   3.1%
Left Rear Bumper           130652   3.1%
Right Side Doors           109876   2.6%
Other values (9)           295857   7.1%
(Missing)                  1701246  40.6%

Length

[Figure: Histogram of lengths of the category]

Value              Count    Frequency (%)
front              1374407  17.4%
left               894376   11.3%
bumper             809576   10.3%
right              753628   9.6%
center             729056   9.2%
end                729056   9.2%
quarter            597100   7.6%
panel              597100   7.6%
rear               461833   5.9%
back               299492   3.8%
Other values (10)  643797   8.2%

Most occurring characters

Value              Count     Frequency (%)
(space)            5404918   12.2%
e                  5166730   11.7%
r                  4866822   11.0%
t                  4390163   9.9%
n                  3432995   7.8%
a                  2070255   4.7%
o                  1922154   4.4%
u                  1408443   3.2%
F                  1374407   3.1%
R                  1220430   2.8%
Other values (24)  12897246  29.2%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  44154563  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

VEHICLE_DAMAGE
Categorical

MISSING 

Distinct      19
Distinct (%)  < 0.1%
Missing       1725730
Missing (%)   41.2%
Memory size   279.2 MiB

Value               Count
Center Front End    382597
Left Front Bumper   259182
Center Back End     255987
Right Front Bumper  238256
No Damage           232503
Other values (14)   1091494

Length

Max length     25
Median length  23
Mean length    17.114411
Min length     4

Characters and Unicode

Total characters     42101776
Distinct characters  34
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Left Front Quarter Panel
2nd row  Right Front Bumper
3rd row  Left Front Quarter Panel
4th row  Center Front End
5th row  Right Rear Bumper

Common Values

Value                      Count    Frequency (%)
Center Front End           382597   9.1%
Left Front Bumper          259182   6.2%
Center Back End            255987   6.1%
Right Front Bumper         238256   5.7%
No Damage                  232503   5.6%
Left Front Quarter Panel   171577   4.1%
Right Front Quarter Panel  166860   4.0%
Left Rear Quarter Panel    136235   3.3%
Left Side Doors            135595   3.2%
Left Rear Bumper           125079   3.0%
Other values (9)           356148   8.5%
(Missing)                  1725730  41.2%

Length

[Figure: Histogram of lengths of the category]

Value              Count    Frequency (%)
front              1218472  16.1%
left               827668   10.9%
bumper             709483   9.3%
right              698601   9.2%
center             638584   8.4%
end                638584   8.4%
quarter            568019   7.5%
panel              568019   7.5%
rear               441627   5.8%
back               255987   3.4%
Other values (10)  1025203  13.5%

Most occurring characters

Value              Count     Frequency (%)
(space)            5130228   12.2%
e                  4940087   11.7%
r                  4456135   10.6%
t                  4000089   9.5%
n                  3068608   7.3%
a                  2305580   5.5%
o                  1962673   4.7%
u                  1280264   3.0%
F                  1218472   2.9%
R                  1145209   2.7%
Other values (24)  12594431  29.9%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  42101776  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

VEHICLE_DAMAGE_1
Categorical

MISSING 

Distinct      19
Distinct (%)  < 0.1%
Missing       2601039
Missing (%)   62.1%
Memory size   268.7 MiB

Value                     Count
No Damage                 437596
Left Front Bumper         158265
Center Front End          149772
Right Front Bumper        126608
Left Front Quarter Panel  100756
Other values (14)         611713

Length

Max length     25
Median length  23
Mean length    15.71644
Min length     4

Characters and Unicode

Total characters     24906000
Distinct characters  34
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Right Front Quarter Panel
2nd row  No Damage
3rd row  Center Back End
4th row  Left Rear Quarter Panel
5th row  Right Front Quarter Panel

Common Values

Value                      Count    Frequency (%)
No Damage                  437596   10.5%
Left Front Bumper          158265   3.8%
Center Front End           149772   3.6%
Right Front Bumper         126608   3.0%
Left Front Quarter Panel   100756   2.4%
Right Front Quarter Panel  92310    2.2%
Left Rear Bumper           83214    2.0%
Right Rear Bumper          77745    1.9%
Left Rear Quarter Panel    71720    1.7%
Left Side Doors            71277    1.7%
Other values (9)           215447   5.1%
(Missing)                  2601039  62.1%

Length

[Figure: Histogram of lengths of the category]

Value              Count   Frequency (%)
front              627711  13.7%
left               485232  10.6%
bumper             445832  9.8%
no                 437596  9.6%
damage             437596  9.6%
right              415584  9.1%
quarter            320917  7.0%
panel              320917  7.0%
rear               288810  6.3%
end                212985  4.7%
Other values (10)  577645  12.6%

Most occurring characters

Value              Count    Frequency (%)
(space)            2986115  12.0%
e                  2897089  11.6%
r                  2386084  9.6%
t                  2088300  8.4%
a                  1874100  7.5%
n                  1378134  5.5%
o                  1340076  5.4%
m                  886239   3.6%
g                  855512   3.4%
u                  767953   3.1%
Other values (24)  7446398  29.9%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  24906000  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

VEHICLE_DAMAGE_2
Categorical

MISSING 

Distinct      19
Distinct (%)  < 0.1%
Missing       2991845
Missing (%)   71.5%
Memory size   263.1 MiB

Value               Count
No Damage           563895
Right Front Bumper  119786
Left Front Bumper   71209
Center Front End    59219
Left Rear Bumper    58982
Other values (14)   320813

Length

Max length     25
Median length  24
Mean length    13.734143
Min length     4

Characters and Unicode

Total characters     16397248
Distinct characters  34
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  No Damage
2nd row  Left Rear Bumper
3rd row  Right Front Bumper
4th row  No Damage
5th row  No Damage

Common Values

Value                      Count    Frequency (%)
No Damage                  563895   13.5%
Right Front Bumper         119786   2.9%
Left Front Bumper          71209    1.7%
Center Front End           59219    1.4%
Left Rear Bumper           58982    1.4%
Left Front Quarter Panel   44918    1.1%
Right Rear Bumper          42469    1.0%
Right Front Quarter Panel  39256    0.9%
Left Rear Quarter Panel    37599    0.9%
Right Rear Quarter Panel   37364    0.9%
Other values (9)           119207   2.8%
(Missing)                  2991845  71.5%

Length

[Figure: Histogram of lengths of the category]

Value              Count   Frequency (%)
no                 563895  18.1%
damage             563895  18.1%
front              334388  10.7%
bumper             292446  9.4%
right              264548  8.5%
left               242613  7.8%
rear               176414  5.7%
quarter            159137  5.1%
panel              159137  5.1%
end                90685   2.9%
Other values (10)  265470  8.5%

Most occurring characters

Value              Count    Frequency (%)
(space)            1918724  11.7%
e                  1866194  11.4%
a                  1658426  10.1%
r                  1302128  7.9%
t                  1118109  6.8%
o                  1013831  6.2%
m                  858137   5.2%
g                  830594   5.1%
n                  677838   4.1%
D                  621269   3.8%
Other values (24)  4531998  27.6%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  16397248  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

VEHICLE_DAMAGE_3
Categorical

IMBALANCE  MISSING 

Distinct      19
Distinct (%)  < 0.1%
Missing       3270248
Missing (%)   78.1%
Memory size   259.3 MiB

Value               Count
No Damage           650342
Center Front End    31656
Other               31361
Right Front Bumper  26206
Left Front Bumper   25271
Other values (14)   150665

Length

Max length     25
Median length  9
Mean length    11.32324
Min length     4

Characters and Unicode

Total characters     10366438
Distinct characters  34
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  No Damage
2nd row  No Damage
3rd row  No Damage
4th row  No Damage
5th row  No Damage

Common Values

Value                      Count    Frequency (%)
No Damage                  650342   15.5%
Center Front End           31656    0.8%
Other                      31361    0.7%
Right Front Bumper         26206    0.6%
Left Front Bumper          25271    0.6%
Left Front Quarter Panel   24371    0.6%
Right Front Quarter Panel  21325    0.5%
Center Back End            17508    0.4%
Left Rear Quarter Panel    15533    0.4%
Left Rear Bumper           15177    0.4%
Other values (9)           56751    1.4%
(Missing)                  3270248  78.1%

Length

[Figure: Histogram of lengths of the category]

Value              Count   Frequency (%)
no                 650342  31.1%
damage             650342  31.1%
front              128829  6.2%
left               92468   4.4%
right              83620   4.0%
bumper             78476   3.8%
quarter            74879   3.6%
panel              74879   3.6%
rear               56182   2.7%
end                49164   2.4%
Other values (10)  152045  7.3%

Most occurring characters

Value              Count    Frequency (%)
a                  1531471  14.8%
e                  1193939  11.5%
(space)            1175725  11.3%
o                  829832   8.0%
g                  737549   7.1%
m                  731377   7.1%
D                  675634   6.5%
N                  650342   6.3%
r                  529428   5.1%
t                  461238   4.4%
Other values (24)  1849903  17.8%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  10366438  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

PUBLIC_PROPERTY_DAMAGE
Categorical

IMBALANCE  MISSING 

Distinct      3
Distinct (%)  < 0.1%
Missing       1528858
Missing (%)   36.5%
Memory size   243.3 MiB

Value        Count
N            2323722
Unspecified  317977
Y            15192

Length

Max length     11
Median length  1
Mean length    2.1968011
Min length     1

Characters and Unicode

Total characters     5836661
Distinct characters  11
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
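
To illustrate how character-level statistics like these can be derived, here is a minimal sketch that rebuilds the character counts for this variable from its three value counts and looks up each character's general category with Python's standard `unicodedata` module. This is not the profiler's own code. The report lists a single "(unknown)" category, whereas the standard-library lookup used here separates uppercase (`Lu`) from lowercase (`Ll`) letters.

```python
import unicodedata
from collections import Counter

# Value counts copied from the PUBLIC_PROPERTY_DAMAGE section of this report.
value_counts = {"N": 2_323_722, "Unspecified": 317_977, "Y": 15_192}

char_counts = Counter()
category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # Each character occurs once per row that holds this value.
        char_counts[ch] += count
        # unicodedata.category returns the two-letter general category, e.g. "Lu".
        category_counts[unicodedata.category(ch)] += count

total_characters = sum(char_counts.values())
print(total_characters)        # 5836661
print(len(char_counts))        # 11
print(char_counts["e"])        # 635954
print(dict(category_counts))
```

The totals agree with the "Total characters" and "Distinct characters" figures above, and the count for `e` matches the character table for this variable.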

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  N
2nd row  N
3rd row  N
4th row  N
5th row  N

Common Values

Value        Count    Frequency (%)
N            2323722  55.5%
Unspecified  317977   7.6%
Y            15192    0.4%
(Missing)    1528858  36.5%
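
The IMBALANCE flag on this variable can be made concrete with an entropy-based score. The sketch below assumes imbalance is measured as one minus the Shannon entropy of the class frequencies, normalised by log2 of the number of classes. That definition is an assumption and is not stated in this report, though with the three non-missing counts above it gives 63.5%, which matches the figure in the report's alert list. The function name `imbalance_score` is hypothetical.

```python
import math

# Non-missing value counts copied from the Common Values table above.
value_counts = {"N": 2_323_722, "Unspecified": 317_977, "Y": 15_192}

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    Columns with fewer than two classes are treated as balanced (an arbitrary choice).
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

score = imbalance_score(list(value_counts.values()))
print(f"{score:.1%}")  # 63.5%
```

Missing rows are excluded before scoring, so the 36.5% missing share does not influence the result.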

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

Value        Count    Frequency (%)
n            2323722  87.5%
unspecified  317977   12.0%
y            15192    0.6%
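
The word table above is consistent with lowercasing each value, splitting it on whitespace, and weighting each word by the number of rows holding that value, with percentages taken over non-missing rows. The sketch below reproduces the three rows under that assumption. The tokenisation rule is inferred from the table and is not documented in this report.

```python
from collections import Counter

# Non-missing value counts for PUBLIC_PROPERTY_DAMAGE, copied from this report.
value_counts = {"N": 2_323_722, "Unspecified": 317_977, "Y": 15_192}

word_counts = Counter()
for value, count in value_counts.items():
    # Assumed tokenisation: lowercase, then split on whitespace.
    for word in value.lower().split():
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word:<12} {count:>8} {count / total_words:6.1%}")
```

For multi-word values such as "Center Front End", the same loop would credit each of the three words with that value's full row count.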

Most occurring characters

Value  Count    Frequency (%)
N      2323722  39.8%
e      635954   10.9%
i      635954   10.9%
U      317977   5.4%
n      317977   5.4%
s      317977   5.4%
p      317977   5.4%
c      317977   5.4%
f      317977   5.4%
d      317977   5.4%

Most occurring categories, scripts and blocks

Value      Count    Frequency (%)
(unknown)  5836661  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

PUBLIC_PROPERTY_DAMAGE_TYPE
Text

MISSING

Distinct      19198
Distinct (%)  73.2%
Missing       4159532
Missing (%)   99.4%
Memory size   129.3 MiB

Length

Max length     866
Median length  383
Mean length    38.39852
Min length     1

Characters and Unicode

Total characters     1006694
Distinct characters  61
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      18215
Unique (%)  69.5%

Sample

1st row  UTILITY POLE
2nd row  PASSENGER FRONT SIDE DAMAGED
3rd row  FENCE OF A SCHOOL IN THE BACK
4th row  POWERLINE CABLES IN FRONT OF 4236 BEDFORD AVENUE
5th row  BRICK FENCE WAS STRUCK BY MV1 WHEN TRYING TO PARK.

Value                 Count   Frequency (%)
fence                 6179    3.6%
of                    5954    3.4%
and                   4795    2.8%
to                    4379    2.5%
the                   3906    2.2%
pole                  3753    2.2%
damage                3383    1.9%
front                 2954    1.7%
vehicle               2698    1.6%
light                 2477    1.4%
Other values (11801)  133124  76.7%

Most occurring characters

Value              Count   Frequency (%)
(space)            147385  14.6%
E                  98251   9.8%
A                  68999   6.9%
T                  65545   6.5%
O                  62299   6.2%
N                  61591   6.1%
I                  55333   5.5%
R                  52986   5.3%
D                  42379   4.2%
L                  40232   4.0%
Other values (51)  311694  31.0%

Most occurring categories, scripts and blocks

Value      Count    Frequency (%)
(unknown)  1006694  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

CONTRIBUTING_FACTOR_1
Text

MISSING 

Distinct      61
Distinct (%)  < 0.1%
Missing       148303
Missing (%)   3.5%
Memory size   286.8 MiB

Length

Max length     53
Median length  11
Mean length    16.314718
Min length     1

Characters and Unicode

Total characters     65869792
Distinct characters  55
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Unspecified
2nd row  Driver Inattention/Distraction
3rd row  Driver Inattention/Distraction
4th row  Unspecified
5th row  Other Vehicular

Value                    Count    Frequency (%)
unspecified              2372696  36.4%
driver                   554052   8.5%
inattention/distraction  514304   7.9%
too                      193911   3.0%
closely                  193911   3.0%
to                       171857   2.6%
failure                  149257   2.3%
yield                    142195   2.2%
right-of-way             142195   2.2%
following                133086   2.0%
Other values (96)        1955174  30.0%

Most occurring characters

Value              Count     Frequency (%)
i                  8576868   13.0%
e                  8049142   12.2%
n                  5785046   8.8%
s                  4067067   6.2%
t                  3458998   5.3%
c                  3426454   5.2%
r                  2949697   4.5%
o                  2880880   4.4%
d                  2852972   4.3%
f                  2800665   4.3%
Other values (45)  21022003  31.9%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  65869792  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

CONTRIBUTING_FACTOR_2
Text

MISSING 

Distinct      56
Distinct (%)  < 0.1%
Missing       1688054
Missing (%)   40.3%
Memory size   220.0 MiB

Length

Max length     53
Median length  11
Mean length    13.748036
Min length     1

Characters and Unicode

Total characters     34338402
Distinct characters  53
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Unspecified
2nd row  Unsafe Lane Changing
3rd row  Unspecified
4th row  Unspecified
5th row  Unspecified

Value                    Count    Frequency (%)
unspecified              1958940  57.9%
driver                   171806   5.1%
inattention/distraction  139153   4.1%
too                      85262    2.5%
closely                  85262    2.5%
lane                     61730    1.8%
passing                  59541    1.8%
following                58590    1.7%
unsafe                   55755    1.6%
to                       51168    1.5%
Other values (94)        654367   19.4%

Most occurring characters

Value              Count    Frequency (%)
e                  5109609  14.9%
i                  5058714  14.7%
n                  3051663  8.9%
s                  2501830  7.3%
c                  2247984  6.5%
p                  2148389  6.3%
d                  2118406  6.2%
f                  2115093  6.2%
U                  2068417  6.0%
r                  938728   2.7%
Other values (43)  6979569  20.3%

Most occurring categories, scripts and blocks

Value      Count     Frequency (%)
(unknown)  34338402  100.0%

Every character is assigned to a single category, script and block, each reported as (unknown). The per-category, per-script and per-block character frequencies are therefore identical to the character frequency table above.

Interactions

[Figures: 16 interaction plots (images not reproduced)]

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
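
For readers without the figures, the two nullity views can be approximated in a few lines. The sketch below uses only the standard library and hypothetical toy rows (not taken from the dataset). `nullity_by_column` gives the per-column missing count and fraction, and `nullity_matrix` renders one text line per row with `#` for present and `.` for missing. Both function names are illustrative.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def nullity_by_column(rows, columns):
    """Return {column: (missing_count, missing_fraction)}."""
    n = len(rows)
    summary = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        summary[col] = (missing, missing / n if n else 0.0)
    return summary

def nullity_matrix(rows, columns):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if is_missing(row.get(c)) else "#" for c in columns)
            for row in rows]

# Hypothetical toy rows; only the column names are borrowed from the dataset.
rows = [
    {"VEHICLE_TYPE": "Sedan", "VEHICLE_YEAR": 2015.0, "DRIVER_SEX": "M"},
    {"VEHICLE_TYPE": "TAXI", "VEHICLE_YEAR": float("nan"), "DRIVER_SEX": None},
    {"VEHICLE_TYPE": None, "VEHICLE_YEAR": 2006.0, "DRIVER_SEX": "F"},
    {"VEHICLE_TYPE": "Bus", "VEHICLE_YEAR": float("nan"), "DRIVER_SEX": None},
]
columns = ["VEHICLE_TYPE", "VEHICLE_YEAR", "DRIVER_SEX"]

summary = nullity_by_column(rows, columns)
matrix = nullity_matrix(rows, columns)
for col, (count, fraction) in summary.items():
    print(f"{col:<14} {count:>3} {fraction:6.1%}")
print("\n".join(matrix))
```

Columns that go missing together, as `VEHICLE_YEAR` and `DRIVER_SEX` do in the toy rows, show up as aligned runs of `.` in the matrix.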

Sample

UNIQUE_IDCOLLISION_IDCRASH_DATECRASH_TIMEVEHICLE_IDSTATE_REGISTRATIONVEHICLE_TYPEVEHICLE_MAKEVEHICLE_MODELVEHICLE_YEARTRAVEL_DIRECTIONVEHICLE_OCCUPANTSDRIVER_SEXDRIVER_LICENSE_STATUSDRIVER_LICENSE_JURISDICTIONPRE_CRASHPOINT_OF_IMPACTVEHICLE_DAMAGEVEHICLE_DAMAGE_1VEHICLE_DAMAGE_2VEHICLE_DAMAGE_3PUBLIC_PROPERTY_DAMAGEPUBLIC_PROPERTY_DAMAGE_TYPECONTRIBUTING_FACTOR_1CONTRIBUTING_FACTOR_2
01038578010020109/07/20129:031NYPASSENGER VEHICLENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNUnspecifiedNaN
119140702421308209/23/20198:150553ab4d-9500-4cba-8d98-f4d7f89d5856NYStation Wagon/Sport Utility VehicleTOYT -CAR/SUVNaN2002.0North1.0MLicensedNYGoing Straight AheadLeft Front BumperLeft Front Quarter PanelNaNNaNNaNNNaNDriver Inattention/DistractionUnspecified
214887647330760810/02/201517:182NYTAXINaNNaNNaNNaNNaNNaNNaNNaNGoing Straight AheadNaNNaNNaNNaNNaNNaNNaNDriver Inattention/DistractionNaN
314889754330869310/04/201520:341NYPASSENGER VEHICLENaNNaNNaNNaNNaNNaNNaNNaNParkedNaNNaNNaNNaNNaNNaNNaNUnspecifiedNaN
41440027029766604/25/201321:151NYPASSENGER VEHICLENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNOther VehicularNaN
517044639343415505/02/201617:35219456NY4 dr sedanMERZ -CAR/SUVNaN2015.0East2.0MLicensedFLMergingRight Front BumperRight Front BumperRight Front Quarter PanelNaNNaNNNaNDriver Inattention/DistractionUnsafe Lane Changing
619138701422906710/24/201913:15c53b43d9-419a-4ab1-9361-3f2979078d89NYBusFRHT-TRUCK/BUSNaN2006.0East13.0MLicensedNYParkedLeft Front Quarter PanelLeft Front Quarter PanelNaNNaNNaNNNaNUnspecifiedUnspecified
717303317350302708/18/201612:39672828NYStation Wagon/Sport Utility VehicleFORD -CAR/SUVNaN2005.0Southwest2.0FLicensedNYGoing Straight AheadCenter Front EndCenter Front EndNo DamageNo DamageNo DamageNNaNDriver Inattention/DistractionUnspecified
81225453619642507/16/201311:201NYPASSENGER VEHICLENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNUnspecifiedNaN
911804847297589711/26/201218:122NYPASSENGER VEHICLENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNDriver Inattention/DistractionNaN
UNIQUE_IDCOLLISION_IDCRASH_DATECRASH_TIMEVEHICLE_IDSTATE_REGISTRATIONVEHICLE_TYPEVEHICLE_MAKEVEHICLE_MODELVEHICLE_YEARTRAVEL_DIRECTIONVEHICLE_OCCUPANTSDRIVER_SEXDRIVER_LICENSE_STATUSDRIVER_LICENSE_JURISDICTIONPRE_CRASHPOINT_OF_IMPACTVEHICLE_DAMAGEVEHICLE_DAMAGE_1VEHICLE_DAMAGE_2VEHICLE_DAMAGE_3PUBLIC_PROPERTY_DAMAGEPUBLIC_PROPERTY_DAMAGE_TYPECONTRIBUTING_FACTOR_1CONTRIBUTING_FACTOR_2
418573920644490472201205/03/202415:45f230e065-53a4-48e2-8de3-5b583563a05bNYSedanHOND -CAR/SUVNaN2015.0West2.0MLicensedNYGoing Straight AheadLeft Side DoorsLeft Side DoorsLeft Rear Quarter PanelLeft Front Quarter PanelNaNNNaNUnspecifiedUnspecified
418574020642932472128505/01/202415:1322263e4c-35fe-4fc4-88c0-b5cbef767b42NYSedanTOYT -CAR/SUVNaN2022.0East1.0MLicensedNYBackingOtherOtherOtherNo DamageNo DamageNNaNUnspecifiedUnspecified
418574120642623472130805/01/202414:0506c02285-3d71-43f9-a91f-f4959f63e4abNYSedanHOND -CAR/SUVNaN2008.0North1.0MLicensedNYGoing Straight AheadCenter Front EndCenter Front EndLeft Front BumperRight Front BumperNaNNNaNUnspecifiedUnspecified
418574220644037472183305/03/20248:0019237a84-aae5-4c05-a6a7-b9cb3176f698NJVanNaNNaN2022.0East2.0MPermitNYStarting in TrafficLeft Front BumperNo DamageNo DamageNaNNaNNNaNUnspecifiedUnspecified
418574320643070472141605/01/202416:20001e5437-a4e6-492c-a180-1a6b30a104c3NYStation Wagon/Sport Utility VehicleTOYT -CAR/SUVNaN2022.0East1.0MLicensedNYMaking U TurnCenter Front EndCenter Front EndNaNNaNNaNNNaNTurning ImproperlyUnspecified
418574420643801472165805/02/202419:009b543b1f-5f75-4e2d-81fb-6e41bf199abeNYStation Wagon/Sport Utility VehicleHOND -CAR/SUVNaN2023.0West1.0MLicensedNYGoing Straight AheadLeft Front BumperNo DamageNo DamageNo DamageNo DamageNNaNUnspecifiedUnspecified
418574520643500472171905/01/202419:4540b07e63-d0c5-484c-b4ad-b0c5c14a025aNYSedanCHEV -CAR/SUVNaN2019.0South1.0MLicensedNYMaking Right TurnLeft Front Quarter PanelLeft Front BumperNo DamageNo DamageNo DamageNNaNDriver Inattention/DistractionUnspecified
418574620644164472184205/03/20246:19f75830c9-55a6-4045-91ae-8482f12da8eaNYSedanTOYT -CAR/SUVNaN1996.0North1.0MLicensedNYGoing Straight AheadCenter Front EndCenter Front EndRight Front BumperNo DamageNo DamageNNaNUnspecifiedUnspecified
418574720643871472156704/28/202420:4530a07aba-062d-4626-8f52-e1606d87ca22NaNGARBAGE TRNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNUnspecifiedNaNNaNNaN
418574820644806472196105/03/202419:34a00495ce-4978-4ab4-9fa8-012545ed51d3NYStation Wagon/Sport Utility VehicleJEEP -CAR/SUVNaN2005.0South1.0MLicensedNYGoing Straight AheadCenter Front EndCenter Front EndNaNNaNNaNNNaNFollowing Too CloselyUnspecified